

Search for: All records

Creators/Authors contains: "Indyk, Piotr"

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).

Some links on this page may take you to non-federal websites, whose policies may differ from this site's.

  1. Free, publicly-accessible full text available July 1, 2026
  2. Free, publicly-accessible full text available July 1, 2026
  3. Free, publicly-accessible full text available July 1, 2026
  4. Free, publicly-accessible full text available May 1, 2026
  5. The online list update problem is defined as follows: we are given a list of items, and the cost of accessing any particular item is its position from the start of the list. A sequence of item accesses arrives online, and our goal is to dynamically reorder the list so that the aggregate access cost is small. We study the stochastic version of the problem, where the items are accessed i.i.d. from an unknown distribution p. The study of the stochastic version goes back at least 60 years to McCabe. In this paper, we first consider the simple online algorithm that swaps an accessed item with the item right before it, unless it is at the very front. This algorithm is known as the Transposition rule. We theoretically analyze the stationary behavior of Transposition and prove that its performance is within a 1 + o(1) factor of the optimal offline algorithm for access sequences sampled from heavy-tailed distributions, proving a conjecture of Rivest from 1976. While the stationary behavior of the Transposition rule is theoretically optimal in the aforementioned i.i.d. setting, it can catastrophically fail under adversarial access sequences in which only the last and second-to-last items are repeatedly accessed. A desirable outcome would be a policy that performs well under both circumstances. To achieve this, we use reinforcement learning to design an adaptive policy that performs well in both the i.i.d. setting and under the above-mentioned adversarial access. Unsurprisingly, the learned policy appears to be an interpolation between Move-to-Front and Transposition, with its behavior closer to Move-to-Front for adversarial access sequences and closer to Transposition for sequences sampled from heavy-tailed distributions, suggesting that the policy is adaptive and capable of responding to patterns in the access sequence.
    Free, publicly-accessible full text available February 24, 2026
  6. The online list update problem is defined as follows: we are given a list of items, and the cost of accessing any particular item is its position from the start of the list. A sequence of item accesses arrives online, and our goal is to dynamically reorder the list so that the aggregate access cost is small. We study the stochastic version of the problem, where the items are accessed i.i.d. from an unknown distribution p. The study of the stochastic version goes back at least 60 years to McCabe. In this paper, we first consider the simple online algorithm that swaps an accessed item with the item right before it, unless it is at the very front. This algorithm is known as the Transposition rule. We theoretically analyze the stationary behavior of Transposition and prove that its performance is within a 1 + o(1) factor of the optimal offline algorithm for access sequences sampled from heavy-tailed distributions, proving a conjecture of Rivest from 1976. While the stationary behavior of the Transposition rule is theoretically optimal in the aforementioned i.i.d. setting, it can catastrophically fail under adversarial access sequences in which only the last and second-to-last items are repeatedly accessed. A desirable outcome would be a policy that performs well under both circumstances. To achieve this, we use reinforcement learning to design an adaptive policy that performs well in both the i.i.d. setting and under the above-mentioned adversarial access. Unsurprisingly, the learned policy appears to be an interpolation between Move-to-Front and Transposition, with its behavior closer to Move-to-Front for adversarial access sequences and closer to Transposition for sequences sampled from heavy-tailed distributions, suggesting that the policy is adaptive and capable of responding to patterns in the access sequence.
    Free, publicly-accessible full text available February 24, 2026
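The two rules contrasted in the abstract above are simple enough to simulate directly. Below is a minimal sketch (not the paper's experiment; list size, stream lengths, and the Zipf-like weights are illustrative assumptions) comparing Transposition and Move-to-Front on a heavy-tailed i.i.d. stream and on the adversarial last-two-items stream:

```python
import random

def access_cost(seq, universe, rule):
    """Total list-update cost of an access sequence: accessing the item
    at (1-based) position i costs i; the rule then reorders the list."""
    lst = list(universe)
    total = 0
    for x in seq:
        i = lst.index(x)
        total += i + 1
        if rule == "mtf":                    # Move-to-Front
            lst.insert(0, lst.pop(i))
        elif rule == "transpose" and i > 0:  # swap with predecessor
            lst[i - 1], lst[i] = lst[i], lst[i - 1]
    return total

random.seed(0)
items = list(range(1, 21))

# Heavy-tailed (Zipf-like) i.i.d. accesses: p(r) proportional to 1/r.
heavy = random.choices(items, weights=[1 / r for r in items], k=5000)
print("heavy-tailed:", access_cost(heavy, items, "transpose"),
      access_cost(heavy, items, "mtf"))

# Adversarial stream: alternate the last two items. Transposition just
# swaps them back and forth at the end of the list and pays the full
# position 20 on every access, while Move-to-Front pays only 2 per
# access once both items have been moved to the front.
adv = [20, 19] * 500
print("adversarial:", access_cost(adv, items, "transpose"),
      access_cost(adv, items, "mtf"))
```

On the adversarial stream, Transposition pays 20 on every one of the 1,000 accesses (20,000 total), while Move-to-Front pays 20 twice and 2 thereafter (2,036 total), matching the failure mode described in the abstract.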
  7. We consider the problem of hypothesis testing for discrete distributions. In the standard model, where we have sample access to an underlying distribution p, extensive research has established optimal bounds for uniformity testing, identity testing (goodness of fit), and closeness testing (equivalence or two-sample testing). We explore these problems in a setting where a predicted data distribution, possibly derived from historical data or predictive machine learning models, is available. We demonstrate that such a predictor can indeed reduce the number of samples required for all three property testing tasks. The reduction in sample complexity depends directly on the predictor’s quality, measured by its total variation distance from p. A key advantage of our algorithms is their adaptability to the precision of the prediction. Specifically, our algorithms can self-adjust their sample complexity based on the accuracy of the available prediction, operating without any prior knowledge of the estimation’s accuracy (i.e. they are consistent). Additionally, we never use more samples than the standard approaches require, even if the predictions provide no meaningful information (i.e. they are also robust). We provide lower bounds to indicate that the improvements in sample complexity achieved by our algorithms are information-theoretically optimal. Furthermore, experimental results show that the performance of our algorithms on real data significantly exceeds our worst-case guarantees for sample complexity, demonstrating the practicality of our approach. 
    Free, publicly-accessible full text available December 10, 2025
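For context on the first of these tasks, a minimal sketch of the classical, prediction-free collision-based uniformity tester follows (a standard baseline, not the paper's prediction-augmented algorithm; the threshold constant, sample sizes, and function names are illustrative assumptions):

```python
import random
from collections import Counter

def collision_rate(samples):
    """Fraction of sample pairs that collide. Its expectation is the
    squared 2-norm of p, which equals 1/n exactly when p is uniform
    over n elements and is strictly larger otherwise."""
    m = len(samples)
    pairs = sum(c * (c - 1) // 2 for c in Counter(samples).values())
    return pairs / (m * (m - 1) / 2)

def looks_uniform(samples, n, eps):
    """Accept when the empirical collision rate is near the uniform
    value 1/n. The (1 + 2*eps**2) threshold is an illustrative choice
    sitting between the uniform rate 1/n and the rate (1 + 4*eps**2)/n
    forced by being eps-far from uniform in total variation distance."""
    return collision_rate(samples) <= (1 + 2 * eps * eps) / n

random.seed(1)
n = 50
uniform_samples = [random.randrange(n) for _ in range(2000)]
skewed_samples = [random.randrange(n // 10) for _ in range(2000)]  # far from uniform
print(looks_uniform(uniform_samples, n, 0.5), looks_uniform(skewed_samples, n, 0.5))
```

The skewed stream concentrates on n/10 elements, so its collision rate is roughly 10/n and it is rejected, while the uniform stream's rate stays near 1/n.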
  8. Free, publicly-accessible full text available December 1, 2025
  9. Metric embeddings traditionally study how to map n items to a target metric space such that distance lengths are not heavily distorted. However, what if we are only interested in preserving the relative order of the distances, rather than their exact lengths? In this paper, we explore the fundamental question: given triplet comparisons of the form "item i is closer to item j than to item k," can we find low-dimensional Euclidean representations for the n items that respect those distance comparisons? Such order-preserving embeddings naturally arise in important applications, such as recommendations, ranking, crowdsourcing, psychometrics, and nearest-neighbor search, and have been studied since the 1950s under the name of ordinal or non-metric embeddings. Our main results include:
     - Nearly-tight bounds on triplet dimension: We introduce the concept of the triplet dimension of a dataset and show, surprisingly, that in order for an ordinal embedding to be triplet-preserving, its dimension needs to grow as n^2 in the worst case. This is nearly optimal, as n − 1 dimensions always suffice.
     - Tradeoffs for dimension vs. (ordinal) relaxation: We relax the requirement that every triplet be exactly preserved and present almost-tight lower bounds on the maximum ratio between distances whose relative order is inverted by the embedding. This ratio is known as (ordinal) relaxation in the literature and serves as a counterpart to (metric) distortion.
     - New bounds on Terminal and top-k-NNs embeddings: Moving beyond triplets, we study two well-motivated scenarios where we care about preserving specific sets of distances (not necessarily triplets). The first is Terminal Ordinal Embeddings, where we aim to preserve relative distance orders to k given items (the "terminals"); for these, we present matching upper and lower bounds. The second is top-k-NNs Ordinal Embeddings, where for each item we aim to preserve the relative order of its k nearest neighbors; for these, we present lower bounds. To the best of our knowledge, these are some of the first tradeoffs on triplet-preserving ordinal embeddings and the first study of Terminal and top-k-NNs Ordinal Embeddings.
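The core object in the abstract above, a triplet comparison, is easy to make concrete. The sketch below (illustrative only; the helper names `triplet_set` and `preserves_triplets` are not from the paper) extracts all triplets "i is closer to j than to k" from a point set and checks whether a candidate embedding preserves every one of them:

```python
import itertools
import math

def triplet_set(points):
    """All ordered triplets (i, j, k) with d(points[i], points[j]) <
    d(points[i], points[k]) under the Euclidean metric."""
    return {(i, j, k)
            for i, j, k in itertools.permutations(range(len(points)), 3)
            if math.dist(points[i], points[j]) < math.dist(points[i], points[k])}

def preserves_triplets(original, embedding):
    """Does the embedding respect every distance comparison of the original?"""
    return triplet_set(original) == triplet_set(embedding)

# Three points on a line, and two candidate 2-D embeddings.
orig = [(0.0,), (1.0,), (3.0,)]
good = [(0.0, 0.0), (1.0, 0.1), (3.0, 0.0)]  # same order of distances
bad = [(0.0, 0.0), (2.5, 0.0), (3.0, 0.0)]   # item 1 now closer to item 2 than to item 0
print(preserves_triplets(orig, good), preserves_triplets(orig, bad))  # → True False
```

This brute-force check enumerates all n(n−1)(n−2) ordered triplets, so it is only a definition-level illustration; the paper's results concern how small the embedding dimension can be while such checks still pass.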